Abstract 

We will use the factors obtained by correspondence analysis to find a biclustering 
of a contingency table such that the row-column cluster pairs are regular, i.e., 
they have small discrepancy. In our main theorem, the constant of the so-called 
volume-regularity is related to the SVD of the normalized contingency table. Our 
result is applicable to two-way cuts when both the rows and columns are divided 
into the same number of clusters, thus extending partly the result of [7] estimating 
the discrepancy of a contingency table by the second largest singular value of the 
normalized table (one-cluster, rectangular case), and partly the result of [5] for 
estimating the constant of volume-regularity by the structural eigenvalues and the 
distances of the corresponding eigen-subspaces of the normalized modularity matrix 
of an edge-weighted graph (several clusters, symmetric case). 
1 Introduction 

A typical problem of contemporary cluster analysis is to find a relatively small 
number of groups of objects, belonging to rows and columns of a contingency 
table, which exhibit homogeneous behavior with respect to each other and do 
not differ significantly in size. To make inferences on the separation that can 
be achieved for a given number of clusters, minimum normalized two-way cuts 
are investigated and related to the SVD of the correspondence matrix. 
Contingency tables are rectangular arrays with nonnegative, real entries. One 
example is the keyword-document matrix. Here the entries are associations 
between documents and words. Based on network data, the entry in the ith 
row and jth column is the relative frequency of word j in document i. Latent 
semantic indexing looks for real scores of the documents and keywords such 
that the score of any document is proportional to the total scores of the 
keywords occurring in it, and vice versa, the score of any keyword is 
proportional to the total scores of the documents containing it. Not surprisingly, 
the solution is given by the SVD of the binary table, where the document- 
and keyword-scores are the coordinates of the left and right singular vectors 
corresponding to its largest non-trivial singular value, which gives the constant 
of proportionality. 

This idea is generalized in [10] in the following way. We can think of the 
above relation between keywords and documents as the relation with respect 
to the most important topic (or context, or factor). After this, we are looking 
for another scoring with respect to the second topic, up to k (where k is a 
positive integer not exceeding the rank of the table). The solution is given by 
the singular vector pairs corresponding to the k largest singular values of the 
table. 

If a scoring system is endowed with the marginal measures, the problem can 
be formulated in terms of correspondence analysis and correlation maximization. 
The problem is solved by the SVD of the correspondence matrix (normalized 
contingency table), where the singular vector pairs are also transformed, 
see [4]. In this way, instead of scores, the documents and keywords 
have k-dimensional representatives, based on which further investigations, 
spatial representation, or biclustering can be performed. Biclustering finds a 
simultaneous clustering of the rows and columns of the table with densities as 
homogeneous as possible between the keyword-document cluster pairs. 

The problem is also related to PageRank (see [11]) and to microarray 
analysis (see [12]) when we want to find clusters of the rows and columns of 
a microarray, simultaneously. Here rows correspond to genes and columns to 
different conditions, whereas the entries are expression levels of genes under 
specific conditions. We also look for a bipartition of the genes and conditions 
such that genes in the same cluster equally (not necessarily weakly or strongly) 
influence conditions of the same cluster. 

In Section 2 we deal with the singular value decomposition (SVD) of a 
correspondence matrix. In Section 3 we relate it to normalized two-way cuts of 
the contingency table, while in Section 4 the constant of volume-regularity 
of row-column cluster pairs is estimated by means of the SVD. Section 5 is 
devoted to discussion, application and possible extension to directed graphs. 

2 SVD of contingency tables and correspondence matrices 

Let $\mathbf{C}$ be a contingency table on row set $Row = \{1, \dots, n\}$ and column set 
$Col = \{1, \dots, m\}$, where $\mathbf{C}$ is an $n \times m$ matrix of entries $c_{ij} \ge 0$. Without loss 
of generality, we suppose that there are no identically zero rows or columns. 
Here $c_{ij}$ is some kind of association between the objects behind row $i$ and 
column $j$, where 0 means no interaction at all. 

Let the row- and column-sums of $\mathbf{C}$ be 

$$d_{row,i} = \sum_{j=1}^{m} c_{ij} \quad (i = 1, \dots, n) \qquad \text{and} \qquad d_{col,j} = \sum_{i=1}^{n} c_{ij} \quad (j = 1, \dots, m),$$

which are collected in the main diagonals of the $n \times n$ and $m \times m$ diagonal 
matrices $\mathbf{D}_{row}$ and $\mathbf{D}_{col}$, respectively. 

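For a given integer $1 \le k \le \min\{n, m\}$, we are looking for $k$-dimensional 
representatives $\mathbf{r}_1, \dots, \mathbf{r}_n$ of the rows and $\mathbf{c}_1, \dots, \mathbf{c}_m$ of the columns such that 
they minimize the objective function 

$$Q_k = \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} \, \|\mathbf{r}_i - \mathbf{c}_j\|^2 \tag{1}$$

subject to 

$$\sum_{i=1}^{n} d_{row,i} \, \mathbf{r}_i \mathbf{r}_i^T = \mathbf{I}_k, \qquad \sum_{j=1}^{m} d_{col,j} \, \mathbf{c}_j \mathbf{c}_j^T = \mathbf{I}_k. \tag{2}$$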

When minimized, the objective function favors a $k$-dimensional placement of 
the rows and columns such that representatives of highly associated rows and 
columns are forced to be close to each other. As we will see, this is equivalent 
to the problem of correspondence analysis. 

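Indeed, let us put both the objective function and the constraints in a more 
favorable form. Let $\mathbf{X}$ be the $n \times k$ matrix of rows $\mathbf{r}_1^T, \dots, \mathbf{r}_n^T$; let $\mathbf{x}_1, \dots, \mathbf{x}_k \in \mathbb{R}^n$ 
denote the columns of $\mathbf{X}$, for which fact we use the notation $\mathbf{X} = (\mathbf{x}_1, \dots, \mathbf{x}_k)$. 
Similarly, let $\mathbf{Y}$ be the $m \times k$ matrix of rows $\mathbf{c}_1^T, \dots, \mathbf{c}_m^T$; let $\mathbf{y}_1, \dots, \mathbf{y}_k \in \mathbb{R}^m$ 
denote the columns of $\mathbf{Y}$, i.e., $\mathbf{Y} = (\mathbf{y}_1, \dots, \mathbf{y}_k)$. Hence, the constraints (2) 
can be formulated like 

$$\mathbf{X}^T \mathbf{D}_{row} \mathbf{X} = \mathbf{I}_k, \qquad \mathbf{Y}^T \mathbf{D}_{col} \mathbf{Y} = \mathbf{I}_k.$$

With this notation, the objective function (1) is 

$$\begin{aligned}
Q_k &= \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} \|\mathbf{r}_i - \mathbf{c}_j\|^2
     = \sum_{i=1}^{n} d_{row,i} \|\mathbf{r}_i\|^2 + \sum_{j=1}^{m} d_{col,j} \|\mathbf{c}_j\|^2 - 2 \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} \, \mathbf{r}_i^T \mathbf{c}_j \\
    &= 2k - 2 \, \mathrm{tr} \, \mathbf{X}^T \mathbf{C} \mathbf{Y}
     = 2k - 2 \, \mathrm{tr} \, (\mathbf{D}_{row}^{1/2} \mathbf{X})^T (\mathbf{D}_{row}^{-1/2} \mathbf{C} \mathbf{D}_{col}^{-1/2}) (\mathbf{D}_{col}^{1/2} \mathbf{Y}),
\end{aligned} \tag{3}$$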

where the matrix $\mathbf{C}_{corr} = \mathbf{D}_{row}^{-1/2} \mathbf{C} \mathbf{D}_{col}^{-1/2}$ is called the correspondence matrix 
(normalized contingency table) belonging to the table $\mathbf{C}$, see [4]. If we multiply all the 
entries of $\mathbf{C}$ by the same positive constant, the correspondence matrix $\mathbf{C}_{corr}$ 
will not change. Therefore, without loss of generality, $\sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} = 1$ 
will be supposed in the sequel. The correspondence matrix has SVD 

$$\mathbf{C}_{corr} = \sum_{i=1}^{r} s_i \mathbf{v}_i \mathbf{u}_i^T, \tag{4}$$

where $r \le \min\{n, m\}$ is the rank of $\mathbf{C}_{corr}$, or equivalently (since there are no 
identically zero rows or columns), the rank of $\mathbf{C}$. Here $1 = s_1 \ge s_2 \ge \dots \ge s_r > 0$ 
are the non-zero singular values of $\mathbf{C}_{corr}$, and 1 is a single singular value 
if $\mathbf{C}_{corr}$, or equivalently, $\mathbf{C}$ is non-decomposable ($\mathbf{C}\mathbf{C}^T$ is irreducible). In this 
case $\mathbf{v}_1 = (\sqrt{d_{row,1}}, \dots, \sqrt{d_{row,n}})^T$ and $\mathbf{u}_1 = (\sqrt{d_{col,1}}, \dots, \sqrt{d_{col,m}})^T$. 
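The trivial singular pair can be checked numerically. The following is a minimal sketch in plain Python, assuming a small hypothetical table (the numbers and the helper name `correspondence_matrix` are illustrative and not taken from the paper): it scales the table to total volume 1, forms the correspondence matrix, and measures how far the vectors of square roots of the row and column sums are from being a singular pair with singular value 1.

```python
import math

def correspondence_matrix(C):
    """Return (C_corr, d_row, d_col) for a nonnegative table C (list of rows).

    The table is first scaled to total volume 1; zero rows/columns are not handled.
    """
    total = sum(sum(row) for row in C)
    P = [[c / total for c in row] for row in C]
    d_row = [sum(row) for row in P]
    d_col = [sum(col) for col in zip(*P)]
    C_corr = [[P[i][j] / math.sqrt(d_row[i] * d_col[j])
               for j in range(len(d_col))]
              for i in range(len(d_row))]
    return C_corr, d_row, d_col

# Hypothetical 2 x 3 table, chosen only for illustration.
C = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0]]
C_corr, d_row, d_col = correspondence_matrix(C)

v1 = [math.sqrt(d) for d in d_row]   # candidate left singular vector
u1 = [math.sqrt(d) for d in d_col]   # candidate right singular vector

# C_corr u1 should reproduce v1, and C_corr^T v1 should reproduce u1.
Cu = [sum(C_corr[i][j] * u1[j] for j in range(3)) for i in range(2)]
Ctv = [sum(C_corr[i][j] * v1[i] for i in range(2)) for j in range(3)]
residual = max([abs(a - b) for a, b in zip(Cu, v1)] +
               [abs(a - b) for a, b in zip(Ctv, u1)])
print(residual)
```

The residual should be at the level of floating-point rounding. Rescaling the input table by a positive constant leaves the returned correspondence matrix unchanged, in line with the remark above.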

Note that the singular spectrum of a decomposable contingency table can be 
composed from the singular spectra of its non-decomposable parts, as well as 
their singular vector pairs. Therefore, in the future, the non-decomposability 
of the underlying contingency table will be supposed. In this way, the following 
representation theorem for contingency tables can be formulated. 

Theorem 1 Let $\mathbf{C}$ be a non-decomposable contingency table with SVD (4) of 
its correspondence matrix $\mathbf{C}_{corr}$. Let $k < r$ be a positive integer such that $s_k > s_{k+1}$. 
Then the minimum of (1) subject to (2) is $2k - 2\sum_{i=1}^{k} s_i$ and it is attained 
with the optimum row representatives $\mathbf{r}_1^*, \dots, \mathbf{r}_n^*$ and column representatives 
$\mathbf{c}_1^*, \dots, \mathbf{c}_m^*$, the transposes of which are row vectors of $\mathbf{X}^* = \mathbf{D}_{row}^{-1/2}(\mathbf{v}_1, \dots, \mathbf{v}_k)$ 
and $\mathbf{Y}^* = \mathbf{D}_{col}^{-1/2}(\mathbf{u}_1, \dots, \mathbf{u}_k)$, respectively. 



PROOF. In view of (3), we have to maximize 

$$\mathrm{tr} \, (\mathbf{D}_{row}^{1/2} \mathbf{X})^T \mathbf{C}_{corr} (\mathbf{D}_{col}^{1/2} \mathbf{Y})$$

under the given constraints. Separation theorems for the singular value 
decomposition (see e.g., [1] and [13]) are applicable, yielding the required statement. 



The vectors $\mathbf{r}_1^*, \dots, \mathbf{r}_n^*$ and $\mathbf{c}_1^*, \dots, \mathbf{c}_m^*$ giving the optimum in the above theorem 
are called optimum $k$-dimensional representatives of the rows and columns, 
while the transformed singular vectors $\mathbf{D}_{row}^{-1/2}\mathbf{v}_1, \dots, \mathbf{D}_{row}^{-1/2}\mathbf{v}_k$ and $\mathbf{D}_{col}^{-1/2}\mathbf{u}_1, \dots, 
\mathbf{D}_{col}^{-1/2}\mathbf{u}_k$ are called vector components of the rows and columns taking part in 
the $k$-dimensional representation. 

Observe that the dimension $k$ does not play an important role here: the 
vector components can be included successively up to a $k$ such that $s_k > s_{k+1}$. 
We remark that the singular vectors can arbitrarily be chosen in the isotropic 
subspaces corresponding to possible multiple singular values, under the 
orthogonality conditions. Further, provided that 1 is a single singular value, the first 
vector components are the constantly 1 vectors in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, 
and hence, the $k$-dimensional representation is realized in a $(k-1)$-dimensional 
hyperplane of $\mathbb{R}^k$. 

A symmetric contingency table corresponds to a weighted graph, and our 
correspondence matrix is the identity minus the normalized Laplacian, called 
normalized modularity matrix in [5]. In another view, a contingency table can 
be considered as part of the weight matrix of a bipartite graph on vertex set 
$Row \cup Col$. However, since it would be tedious to always distinguish between these 
two types of vertices, we rather use the framework of correspondence analysis, 
and formulate our statements in terms of rows and columns. 



3 Normalized two-way cuts of contingency tables 

Given the $n \times m$ contingency table $\mathbf{C}$ on row set $Row$ and column set $Col$, 
further, an integer $k$ ($0 < k < r$), we want to simultaneously partition its rows 
and columns into disjoint, nonempty subsets 

$$Row = R_1 \cup \dots \cup R_k, \qquad Col = C_1 \cup \dots \cup C_k$$

such that the cuts $c(R_a, C_b) = \sum_{i \in R_a} \sum_{j \in C_b} c_{ij}$ ($a, b = 1, \dots, k$) between the 
row-column cluster pairs be as homogeneous as possible. For this requirement, 
the following so-called normalized two-way cut of the contingency table with 
respect to the above $k$-partitions $P_{row} = (R_1, \dots, R_k)$ and $P_{col} = (C_1, \dots, C_k)$ 
of its rows and columns and the collection of signs $\sigma$ is defined as follows: 

$$\nu_k(P_{row}, P_{col}, \sigma) = \sum_{a=1}^{k} \sum_{b=1}^{k} \left( \frac{1}{\mathrm{Vol}(R_a)} + \frac{1}{\mathrm{Vol}(C_b)} - \frac{2 \sigma_{ab} \delta_{ab}}{\sqrt{\mathrm{Vol}(R_a) \mathrm{Vol}(C_b)}} \right) c(R_a, C_b),$$

where 

$$\mathrm{Vol}(R_a) = \sum_{i \in R_a} d_{row,i} = \sum_{i \in R_a} \sum_{j=1}^{m} c_{ij}, \qquad \mathrm{Vol}(C_b) = \sum_{j \in C_b} d_{col,j} = \sum_{j \in C_b} \sum_{i=1}^{n} c_{ij}$$

are volumes of the clusters, $\delta_{ab}$ is the Kronecker delta, and the sign $\sigma_{ab}$ is 
equal to 1 or -1 (it only has relevance in the $a = b$ case, when it helps 
balancing between the volumes of the same index row and column clusters), 
$\sigma := (\sigma_{11}, \dots, \sigma_{kk})$. We want to minimize the above normalized two-way cut 
with respect to all possible $k$-partitions $P_{row}$ and $P_{col}$ of the rows and 
columns, further, to $\sigma$, simultaneously. The objective function penalizes row- 
and column clusters of extremely different volumes in the $a \ne b$ case, whereas 
in the $a = b$ case it moderates the balance between $\mathrm{Vol}(R_a)$ and $\mathrm{Vol}(C_a)$. 
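To make the definition concrete, here is a minimal sketch in plain Python that evaluates the normalized two-way cut for given partitions and signs. The function name `two_way_cut`, the toy table and the partitions are hypothetical; the diagonal term is subtracted, so that $\sigma_{aa} = 1$ cancels a diagonal pair whose row and column volumes coincide, and this sign convention should be treated as an assumption of the sketch.

```python
import math

def two_way_cut(P, row_parts, col_parts, sigma):
    """Normalized two-way cut of a table P (list of rows) of total volume 1.

    row_parts / col_parts: lists of k index lists; sigma: list of k signs
    (+1 or -1), used only for the diagonal pairs a == b.
    """
    d_row = [sum(row) for row in P]
    d_col = [sum(col) for col in zip(*P)]
    total = 0.0
    for a, R in enumerate(row_parts):
        vol_R = sum(d_row[i] for i in R)
        for b, Cb in enumerate(col_parts):
            vol_C = sum(d_col[j] for j in Cb)
            cut = sum(P[i][j] for i in R for j in Cb)
            weight = 1.0 / vol_R + 1.0 / vol_C
            if a == b:
                weight -= 2.0 * sigma[a] / math.sqrt(vol_R * vol_C)
            total += weight * cut
    return total

# Hypothetical 4 x 4 table with two diagonal blocks (total volume 1).
P = [[0.125, 0.125, 0.0, 0.0],
     [0.125, 0.125, 0.0, 0.0],
     [0.0, 0.0, 0.125, 0.125],
     [0.0, 0.0, 0.125, 0.125]]
cols = [[0, 1], [2, 3]]

matched = two_way_cut(P, [[0, 1], [2, 3]], cols, [1, 1])      # partitions follow the blocks
mismatched = two_way_cut(P, [[0, 2], [1, 3]], cols, [1, 1])   # row partition cuts across blocks
matched_neg = two_way_cut(P, [[0, 1], [2, 3]], cols, [-1, -1])
print(matched, mismatched, matched_neg)
```

This toy table is decomposable, so it only illustrates the arithmetic of the definition: the partition that follows the blocks has no off-diagonal cut and the diagonal weights vanish for positive signs, while the other choices are penalized.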

Definition 2 The normalized two-way cut of the contingency table $\mathbf{C}$ is 

$$\nu_k(\mathbf{C}) = \min_{P_{row}, P_{col}, \sigma} \nu_k(P_{row}, P_{col}, \sigma).$$

Theorem 3 Let $1 = s_1 > s_2 \ge \dots \ge s_r > 0$ be the positive singular values of the 
correspondence matrix belonging to the non-decomposable contingency table $\mathbf{C}$ 
of rank $r$, and $k < r$ be a positive integer. Then 

$$\nu_k(\mathbf{C}) \ge 2k - 2\sum_{i=1}^{k} s_i.$$



PROOF. We will show that $\nu_k(P_{row}, P_{col}, \sigma)$ is $Q_k$ in the special 
representation, where the column vectors of $\mathbf{X}$ and $\mathbf{Y}$ are partition vectors belonging 
to $P_{row}$ and $P_{col}$, respectively. Therefore, the statement follows, as the 
overall minimum of $Q_k$ is $2k - 2\sum_{i=1}^{k} s_i$. Indeed, let the $i$th coordinate of the $a$th 
left vector component be 

$$x_{ia} := \frac{1}{\sqrt{\mathrm{Vol}(R_a)}} \quad \text{if } i \in R_a, \quad a = 1, \dots, k;$$

similarly, let the $j$th coordinate of the $b$th right vector component be 

$$y_{jb} := \frac{\sigma_{bb}}{\sqrt{\mathrm{Vol}(C_b)}} \quad \text{if } j \in C_b, \quad b = 1, \dots, k,$$

otherwise the coordinates are zeros. With this, the matrices $\mathbf{X}$ and $\mathbf{Y}$ satisfy 
the conditions imposed on the representatives, further 

$$\|\mathbf{r}_i - \mathbf{c}_j\|^2 = \frac{1}{\mathrm{Vol}(R_a)} + \frac{1}{\mathrm{Vol}(C_b)} - \frac{2 \sigma_{bb} \delta_{ab}}{\sqrt{\mathrm{Vol}(R_a) \mathrm{Vol}(C_b)}}, \qquad i \in R_a, \; j \in C_b.$$



In case of a symmetric contingency table (weight matrix $\mathbf{W}$ of an 
edge-weighted graph), we get the same result with the representation based on 
the eigenvectors belonging to the largest absolute value eigenvalues of the 
matrix $\mathbf{D}^{-1/2} \mathbf{W} \mathbf{D}^{-1/2}$, where $\mathbf{D} = \mathbf{D}_{row} = \mathbf{D}_{col}$, see [5]. However, $\nu_k(P_{row}, P_{col}, \sigma)$ 
cannot always be directly related to the normalized cut, except in the following 
two special cases. 

• When the $k - 1$ largest absolute value eigenvalues of the normalized 
modularity matrix are all positive, or equivalently, if the $k$ smallest eigenvalues 
(including the zero) of the normalized Laplacian matrix are farther from 1 
than any other eigenvalue which is greater than 1. In this case the $k - 1$ 
largest singular values (apart from the 1) of the correspondence matrix are 
identical to the $k - 1$ largest eigenvalues of the normalized modularity 
matrix, and the left and right singular vectors are identical to the corresponding 
eigenvector with the same orientation. Consequently, for the $k$-dimensional 
row- and column-representatives $\mathbf{r}_i = \mathbf{c}_i$ ($i = 1, \dots, n = m$) holds. With 
the choice $\sigma_{bb} = 1$ ($b = 1, \dots, k$), the corresponding $\nu_k(\mathbf{C})$ is twice the 
normalized cut of our weighted graph in which weights of edges within the 
clusters do not count. In this special situation, the normalized two-way cut 
also favors $k$-partitions with low inter-cluster edge-densities (consequently, 
intra-cluster densities tend to be large, as they do not count in the objective 
function). 

• When the $k - 1$ largest absolute value eigenvalues of the normalized 
modularity matrix are all negative, then $\mathbf{r}_i = -\mathbf{c}_i$ for all $(k - 1)$-dimensional 
row and column representatives, and any (but only one) of them can be 
the corresponding vertex representative. Now $\nu_k(\mathbf{C})$, which is attained with 
the choice $\sigma_{bb} = -1$ ($b = 1, \dots, k$), differs from the normalized cut in that 
it also counts the edge-weights within the clusters. Indeed, in the $a = b$, 
$R_a = C_a = V_a$ case 

$$\|\mathbf{r}_i - \mathbf{c}_j\|^2 = \frac{1}{\mathrm{Vol}(V_a)} + \frac{1}{\mathrm{Vol}(V_a)} + \frac{2}{\sqrt{\mathrm{Vol}(V_a) \mathrm{Vol}(V_a)}} = \frac{4}{\mathrm{Vol}(V_a)}$$

if $i, j \in V_a$. Here, by minimizing the normalized $k$-way cut, rather a 
so-called anti-community structure is detected in that $c(R_a, C_a) = c(V_a, V_a)$ is 
suppressed to compensate for the term $\frac{4}{\mathrm{Vol}(V_a)}$. 

We remark that Ding et al. [9] treat this problem for two row- and 
column-clusters and minimize another objective function such that it favors 2-partitions 
where $c(R_1, C_2)$ and $c(R_2, C_1)$ are small compared to $c(R_1, C_1)$ and $c(R_2, C_2)$. 
The solution is also given by the transformed $\mathbf{v}_2, \mathbf{u}_2$ pair. However, it is the 
objective function $Q_k$ which best complies with the SVD of the correspondence 
matrix, and hence, gives the continuous relaxation of the normalized cut 
minimization problem. The idea of Ding et al. could be naturally extended to 
the case of several, but the same number of row and column clusters, and it 
may work well in the keyword-document classification problem. However, in 
some real-life problems, e.g., clustering genes and conditions of microarrays, 
we rather want to find clusters of similarly functioning genes that equally 
(not especially weakly or strongly) influence conditions of the same cluster. 
Dhillon [8] also suggests a multipartition algorithm that runs the k-means 
algorithm simultaneously for the row and column representatives. 



4 Regular row-column cluster pairs 

Let us start with the one-cluster case. Let $\mathbf{C}$ be an $n \times m$ contingency table 
and let $\mathbf{C}_{corr}$ be the correspondence matrix belonging to it. The Expander 
Mixing Lemma for edge-weighted graphs naturally extends to this situation, 
see the following result of [7]. 

Proposition 4 Let $\mathbf{C}$ be a non-decomposable contingency table (i.e., $\mathbf{C}\mathbf{C}^T$ is 
irreducible) on row set $Row$ and column set $Col$, and of total volume 1. Then 
for all $R \subset Row$ and $C \subset Col$ 

$$|c(R, C) - \mathrm{Vol}(R)\mathrm{Vol}(C)| \le s_2 \sqrt{\mathrm{Vol}(R)\mathrm{Vol}(C)},$$

where $s_2$ is the largest but 1 singular value of the normalized contingency table 
$\mathbf{C}_{corr}$. 

Since the spectral gap of $\mathbf{C}_{corr}$ is $1 - s_2$, in view of the above Expander Mixing 
Lemma, a 'large' spectral gap is an indication that the weighted cut between any 
row and column subset of the contingency table is near to what is expected in 
a random table. The following notion of discrepancy measures the 
deviation from this random situation. The discrepancy (see [7]) of the contingency 
table $\mathbf{C}$ of total volume 1 is the smallest $\alpha > 0$ such that for all $R \subset Row$ and 
$C \subset Col$ 

$$|c(R, C) - \mathrm{Vol}(R)\mathrm{Vol}(C)| \le \alpha \sqrt{\mathrm{Vol}(R)\mathrm{Vol}(C)}.$$
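For very small tables, the discrepancy and the second singular value can be compared by brute force. The sketch below is plain Python under stated assumptions: the 3 x 3 table is hypothetical, the helper names are illustrative, the discrepancy is found by enumerating all nonempty subset pairs (exponential cost), and the second singular value is approximated by a deflated power iteration rather than a library SVD.

```python
import math

def second_singular_value(C_corr, u1, iters=500):
    """Power iteration on C_corr^T C_corr, kept orthogonal to the unit vector u1."""
    n, m = len(C_corr), len(C_corr[0])
    y = [(-1.0) ** j * (j + 1) for j in range(m)]          # arbitrary start vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(y, u1))
        y = [a - dot * b for a, b in zip(y, u1)]           # deflate the trivial direction
        x = [sum(C_corr[i][j] * y[j] for j in range(m)) for i in range(n)]
        y = [sum(C_corr[i][j] * x[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(a * a for a in y))
        if norm == 0.0:
            return 0.0
        y = [a / norm for a in y]
    x = [sum(C_corr[i][j] * y[j] for j in range(m)) for i in range(n)]
    return math.sqrt(sum(a * a for a in x))

def discrepancy(P):
    """Smallest alpha in the definition above, by enumerating all subset pairs."""
    n, m = len(P), len(P[0])
    d_row = [sum(row) for row in P]
    d_col = [sum(col) for col in zip(*P)]
    alpha = 0.0
    for rmask in range(1, 2 ** n):
        R = [i for i in range(n) if (rmask >> i) & 1]
        vol_R = sum(d_row[i] for i in R)
        for cmask in range(1, 2 ** m):
            Cs = [j for j in range(m) if (cmask >> j) & 1]
            vol_C = sum(d_col[j] for j in Cs)
            cut = sum(P[i][j] for i in R for j in Cs)
            alpha = max(alpha, abs(cut - vol_R * vol_C) / math.sqrt(vol_R * vol_C))
    return alpha

# Hypothetical symmetric table of total volume 1.
P = [[c / 18.0 for c in row] for row in [[4, 1, 1], [1, 4, 1], [1, 1, 4]]]
d_row = [sum(row) for row in P]
d_col = [sum(col) for col in zip(*P)]
C_corr = [[P[i][j] / math.sqrt(d_row[i] * d_col[j]) for j in range(3)] for i in range(3)]
u1 = [math.sqrt(d) for d in d_col]

s2 = second_singular_value(C_corr, u1)
alpha = discrepancy(P)
print(alpha, s2)
```

For this table a direct calculation gives a discrepancy of 1/3 and a second singular value of 1/2, so the bound of Proposition 4 holds with room to spare.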

In view of this, the result of Proposition 4 can be interpreted as follows: the 
discrepancy is at most the second largest singular value $s_2$ of the normalized 
contingency table, and $s_2$ is the smaller, the bigger the separation between the 
largest singular value (the 1) of the normalized contingency table and the other 
singular values is. Based on the ideas of [2] and [6], Butler [7] proves the 
converse of the Expander Mixing Lemma for contingency tables, namely that 

$$s_2 \le 150 \, \alpha \, (1 - 8 \log \alpha).$$

Now we extend the notion of discrepancy to volume-regular pairs. 

Definition 5 The row-column cluster pair $R \subset Row$, $C \subset Col$ of the 
contingency table $\mathbf{C}$ of total volume 1 is $\gamma$-volume regular if for all $X \subset R$ and 
$Y \subset C$ the relation 

$$|c(X, Y) - \rho(R, C) \mathrm{Vol}(X) \mathrm{Vol}(Y)| \le \gamma \sqrt{\mathrm{Vol}(R) \mathrm{Vol}(C)} \tag{5}$$

holds, where $\rho(R, C) = \frac{c(R, C)}{\mathrm{Vol}(R) \mathrm{Vol}(C)}$ is the relative inter-cluster density of the 
row-column pair $R, C$. 
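As an illustration of Definition 5, the following plain-Python sketch computes the smallest constant for which a given pair satisfies (5), by enumerating all nonempty subsets of the two clusters. The helper names and both toy tables are hypothetical, and the enumeration is only feasible for very small clusters.

```python
import math
from itertools import combinations

def nonempty_subsets(items):
    """Yield every nonempty subset of a list of indices as a tuple."""
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            yield subset

def volume_regularity(P, R, C):
    """Smallest gamma for which the pair (R, C) satisfies (5) in a table P of total volume 1."""
    d_row = [sum(row) for row in P]
    d_col = [sum(col) for col in zip(*P)]
    vol_R = sum(d_row[i] for i in R)
    vol_C = sum(d_col[j] for j in C)
    rho = sum(P[i][j] for i in R for j in C) / (vol_R * vol_C)
    worst = 0.0
    for X in nonempty_subsets(R):
        vol_X = sum(d_row[i] for i in X)
        for Y in nonempty_subsets(C):
            vol_Y = sum(d_col[j] for j in Y)
            cut = sum(P[i][j] for i in X for j in Y)
            worst = max(worst, abs(cut - rho * vol_X * vol_Y))
    return worst / math.sqrt(vol_R * vol_C)

# A rank-one table (outer product of two probability vectors) is perfectly regular.
a, b = [0.2, 0.3, 0.5], [0.1, 0.4, 0.5]
rank_one = [[ai * bj for bj in b] for ai in a]
gamma_flat = volume_regularity(rank_one, [0, 1], [1, 2])

# A table with a heavier diagonal is less regular on a diagonal block.
P = [[c / 18.0 for c in row] for row in [[4, 1, 1], [1, 4, 1], [1, 1, 4]]]
gamma_block = volume_regularity(P, [0, 1], [0, 1])
print(gamma_flat, gamma_block)
```

In the rank-one table every cut equals the product of the volumes, so the constant is zero up to rounding; for the second table the worst case is a single row against a single column of the block, which gives 0.125.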

Now we will show that for given $k$, if the clusters are formed via applying the 
weighted $k$-means algorithm for the optimal row- and column representatives, 
respectively, then the so obtained row-column cluster pairs are homogeneous 
in the sense that they form equally dense parts of the contingency table. More 
precisely, the constant $\gamma$ of the volume regularity of the pairs will be related 
to the SVD of $\mathbf{C}_{corr}$. To this end, we introduce the following notion. 

The weighted $k$-variance of the $k$-dimensional row representatives is defined 
by 

$$S_k^2(\mathbf{X}) = \min_{(R_1, \dots, R_k)} \sum_{a=1}^{k} \sum_{j \in R_a} d_{row,j} \|\mathbf{r}_j - \bar{\mathbf{r}}_a\|^2, \tag{6}$$

where $\bar{\mathbf{r}}_a = \frac{1}{\mathrm{Vol}(R_a)} \sum_{j \in R_a} d_{row,j} \mathbf{r}_j$ is the weighted center of cluster $R_a$ ($a = 
1, \dots, k$). Similarly, the weighted $k$-variance of the $k$-dimensional column 
representatives is 

$$S_k^2(\mathbf{Y}) = \min_{(C_1, \dots, C_k)} \sum_{a=1}^{k} \sum_{j \in C_a} d_{col,j} \|\mathbf{c}_j - \bar{\mathbf{c}}_a\|^2, \tag{7}$$

where $\bar{\mathbf{c}}_a = \frac{1}{\mathrm{Vol}(C_a)} \sum_{j \in C_a} d_{col,j} \mathbf{c}_j$ is the weighted center of cluster $C_a$ ($a = 
1, \dots, k$). Observe that the trivial vector components can be omitted, and the 
$k$-variance of the so obtained $(k - 1)$-dimensional representatives will be the 
same. 
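The weighted $k$-variance can be evaluated exactly on tiny inputs by trying every assignment of the representatives to $k$ nonempty clusters. The sketch below does this in plain Python; the function names, the one-dimensional representatives and the weights are hypothetical, and in practice the exhaustive search would be replaced by a weighted k-means heuristic, which is an assumption about usage rather than something stated in the text.

```python
from itertools import product

def weighted_cost(reps, weights, labels, k):
    """Weighted squared distances of the representatives to their weighted cluster centers."""
    dim = len(reps[0])
    cost = 0.0
    for a in range(k):
        members = [i for i, lab in enumerate(labels) if lab == a]
        vol = sum(weights[i] for i in members)
        center = [sum(weights[i] * reps[i][t] for i in members) / vol
                  for t in range(dim)]
        cost += sum(weights[i] * sum((reps[i][t] - center[t]) ** 2 for t in range(dim))
                    for i in members)
    return cost

def weighted_k_variance(reps, weights, k):
    """Exhaustive minimum of weighted_cost over all partitions into k nonempty clusters."""
    best_cost, best_labels = None, None
    for labels in product(range(k), repeat=len(reps)):
        if len(set(labels)) < k:          # skip assignments with an empty cluster
            continue
        cost = weighted_cost(reps, weights, labels, k)
        if best_cost is None or cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels

# Hypothetical one-dimensional representatives with row sums as weights.
reps = [[0.0], [0.1], [1.0], [1.1]]
weights = [0.1, 0.4, 0.3, 0.2]
cost, labels = weighted_k_variance(reps, weights, 2)
print(cost, labels)
```

Here the two tight groups are recovered, with weighted centers 0.08 and 1.04 and a total cost of 0.002.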

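Definition 6 The cut-norm of the rectangular real matrix $\mathbf{A}$ with row-set 
$Row$ and column-set $Col$ is 

$$\|\mathbf{A}\|_{\square} = \max_{R \subset Row, \, C \subset Col} \left| \sum_{i \in R} \sum_{j \in C} a_{ij} \right|.$$

Lemma 7 For the cut-norm of the $n \times m$ real matrix $\mathbf{A}$ 

$$\|\mathbf{A}\|_{\square} \le \sqrt{nm} \, \|\mathbf{A}\|$$

holds, where the right hand side contains its spectral norm, i.e., the largest 
singular value of $\mathbf{A}$. 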



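PROOF. 

$$\begin{aligned}
\|\mathbf{A}\|_{\square} &= \max_{\mathbf{x} \in \{0,1\}^n, \, \mathbf{y} \in \{0,1\}^m} |\mathbf{x}^T \mathbf{A} \mathbf{y}|
 = \max_{\mathbf{x} \in \{0,1\}^n, \, \mathbf{y} \in \{0,1\}^m} \left| \left( \frac{\mathbf{x}}{\|\mathbf{x}\|} \right)^T \mathbf{A} \left( \frac{\mathbf{y}}{\|\mathbf{y}\|} \right) \right| \cdot \|\mathbf{x}\| \cdot \|\mathbf{y}\| \\
 &\le \sqrt{nm} \max_{\|\mathbf{x}\| = 1, \, \|\mathbf{y}\| = 1} |\mathbf{x}^T \mathbf{A} \mathbf{y}| = \sqrt{nm} \, \|\mathbf{A}\|,
\end{aligned}$$

since for $\mathbf{x} \in \{0, 1\}^n$, $\|\mathbf{x}\| \le \sqrt{n}$, and for $\mathbf{y} \in \{0, 1\}^m$, $\|\mathbf{y}\| \le \sqrt{m}$. 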
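Lemma 7 can be checked on small matrices. The following plain-Python sketch computes the cut-norm by enumerating all row and column subsets and approximates the spectral norm by power iteration; both matrices and the function names are hypothetical, and the power iteration is a stand-in for a library SVD.

```python
import math

def cut_norm(A):
    """Cut-norm by brute force over all nonempty row and column subsets."""
    n, m = len(A), len(A[0])
    best = 0.0
    for rmask in range(1, 2 ** n):
        rows = [i for i in range(n) if (rmask >> i) & 1]
        col_sums = [sum(A[i][j] for i in rows) for j in range(m)]
        for cmask in range(1, 2 ** m):
            total = sum(col_sums[j] for j in range(m) if (cmask >> j) & 1)
            best = max(best, abs(total))
    return best

def spectral_norm(A, iters=500):
    """Largest singular value, approximated by power iteration on A^T A."""
    n, m = len(A), len(A[0])
    y = [1.0 + j for j in range(m)]                       # arbitrary start vector
    for _ in range(iters):
        x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        y = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(t * t for t in y))
        if norm == 0.0:
            return 0.0
        y = [t / norm for t in y]
    x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    return math.sqrt(sum(t * t for t in x))

ones = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]                 # the bound is attained here
A = [[1.0, -2.0, 0.5], [-1.5, 2.0, 1.0]]                  # a matrix with mixed signs

cut_ones, bound_ones = cut_norm(ones), math.sqrt(6) * spectral_norm(ones)
cut_A, bound_A = cut_norm(A), math.sqrt(6) * spectral_norm(A)
print(cut_ones, bound_ones, cut_A, bound_A)
```

For the all-ones matrix both sides equal $nm = 6$, so the factor $\sqrt{nm}$ cannot be improved in general; for the mixed-sign matrix the cut-norm is 3 and the bound is far from tight.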



Theorem 8 Let $\mathbf{C}$ be a non-decomposable contingency table of $n$-element 
row set $Row$ and $m$-element column set $Col$, with row- and column sums 
$d_{row,1}, \dots, d_{row,n}$ and $d_{col,1}, \dots, d_{col,m}$, respectively. Suppose that $\sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} = 
1$ and there are no dominant rows and columns: $d_{row,i} = \Theta(1/n)$ ($i = 1, \dots, n$) 
and $d_{col,j} = \Theta(1/m)$ ($j = 1, \dots, m$) as $n, m \to \infty$. Let the singular values of 
$\mathbf{C}_{corr}$ be 

$$1 = s_1 > s_2 \ge \dots \ge s_k > \varepsilon \ge s_i, \quad i \ge k + 1.$$

The partition $(R_1, \dots, R_k)$ of $Row$ and $(C_1, \dots, C_k)$ of $Col$ are defined so 
that they minimize the weighted $k$-variances $S_k^2(\mathbf{X})$ and $S_k^2(\mathbf{Y})$ of the row and 
column representatives defined in (6) and (7), respectively. Suppose that there 
are constants $0 < K_1, K_2 \le \frac{1}{k}$ such that $|R_i| \ge K_1 n$ and $|C_i| \ge K_2 m$ ($i = 
1, \dots, k$), respectively. Then the $R_i, C_j$ pairs are $O(\sqrt{2k}(S_k(\mathbf{X}) + S_k(\mathbf{Y})) + \varepsilon)$-
volume regular ($i, j = 1, \dots, k$). 



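PROOF. Recall that provided $\mathbf{C}$ is non-decomposable, the largest 
singular value $s_1 = 1$ of $\mathbf{C}_{corr}$ is single with corresponding singular vector pair 
$\mathbf{v}_1 = \mathbf{D}_{row}^{1/2} \mathbf{1}_n$ and $\mathbf{u}_1 = \mathbf{D}_{col}^{1/2} \mathbf{1}_m$ with the constantly 1 vectors of appropriate 
size. The optimal $k$-dimensional representatives of the rows and columns are 
row vectors of the matrices $\mathbf{X} = (\mathbf{x}_1, \dots, \mathbf{x}_k)$ and $\mathbf{Y} = (\mathbf{y}_1, \dots, \mathbf{y}_k)$, where 
$\mathbf{x}_i = \mathbf{D}_{row}^{-1/2} \mathbf{v}_i$ and $\mathbf{y}_i = \mathbf{D}_{col}^{-1/2} \mathbf{u}_i$, respectively ($i = 1, \dots, k$). Suppose that the 
minimum $k$-variance is attained on the $k$-partition $(R_1, \dots, R_k)$ of the rows 
and $(C_1, \dots, C_k)$ of the columns. By an easy analysis of variance argument 
of [3] it follows that 

$$S_k^2(\mathbf{X}) = \sum_{i=1}^{k} \mathrm{dist}^2(\mathbf{v}_i, F), \qquad S_k^2(\mathbf{Y}) = \sum_{i=1}^{k} \mathrm{dist}^2(\mathbf{u}_i, G),$$

where $F = \mathrm{Span}\{\mathbf{D}_{row}^{1/2} \mathbf{w}_1, \dots, \mathbf{D}_{row}^{1/2} \mathbf{w}_k\}$ and $G = \mathrm{Span}\{\mathbf{D}_{col}^{1/2} \mathbf{z}_1, \dots, \mathbf{D}_{col}^{1/2} \mathbf{z}_k\}$ 
with the so-called normalized row partition vectors $\mathbf{w}_1, \dots, \mathbf{w}_k$ of coordinates 
$w_{ji} = \frac{1}{\sqrt{\mathrm{Vol}(R_i)}}$ if $j \in R_i$ and 0, otherwise, and column partition vectors 
$\mathbf{z}_1, \dots, \mathbf{z}_k$ of coordinates $z_{ji} = \frac{1}{\sqrt{\mathrm{Vol}(C_i)}}$ if $j \in C_i$ and 0, otherwise ($i = 1, \dots, k$). 
Note that the vectors $\mathbf{D}_{row}^{1/2} \mathbf{w}_1, \dots, \mathbf{D}_{row}^{1/2} \mathbf{w}_k$ and $\mathbf{D}_{col}^{1/2} \mathbf{z}_1, \dots, \mathbf{D}_{col}^{1/2} \mathbf{z}_k$ form 
orthonormal systems in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively (but they are, usually, not 
complete). By [3], we can find orthonormal systems $\tilde{\mathbf{v}}_1, \dots, \tilde{\mathbf{v}}_k \in F$ and 
$\tilde{\mathbf{u}}_1, \dots, \tilde{\mathbf{u}}_k \in G$ such that 

$$S_k^2(\mathbf{X}) \le \sum_{i=1}^{k} \|\mathbf{v}_i - \tilde{\mathbf{v}}_i\|^2 \le 2 S_k^2(\mathbf{X}), \qquad S_k^2(\mathbf{Y}) \le \sum_{i=1}^{k} \|\mathbf{u}_i - \tilde{\mathbf{u}}_i\|^2 \le 2 S_k^2(\mathbf{Y}).$$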
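We approximate the matrix $\mathbf{C}_{corr} = \sum_{i=1}^{r} s_i \mathbf{v}_i \mathbf{u}_i^T$ by the rank $k$ matrix $\sum_{i=1}^{k} s_i \tilde{\mathbf{v}}_i \tilde{\mathbf{u}}_i^T$ 
with the following accuracy (in spectral norm): 

$$\left\| \sum_{i=1}^{r} s_i \mathbf{v}_i \mathbf{u}_i^T - \sum_{i=1}^{k} s_i \tilde{\mathbf{v}}_i \tilde{\mathbf{u}}_i^T \right\| \le \sum_{i=1}^{k} s_i \left\| \mathbf{v}_i \mathbf{u}_i^T - \tilde{\mathbf{v}}_i \tilde{\mathbf{u}}_i^T \right\| + \left\| \sum_{i=k+1}^{r} s_i \mathbf{v}_i \mathbf{u}_i^T \right\|, \tag{8}$$

where the spectral norm of the last term is at most $\varepsilon$, and the individual 
terms of the first one are estimated from above in the following way. 

$$\begin{aligned}
\left\| \mathbf{v}_i \mathbf{u}_i^T - \tilde{\mathbf{v}}_i \tilde{\mathbf{u}}_i^T \right\|
 &\le \left\| (\mathbf{v}_i - \tilde{\mathbf{v}}_i) \mathbf{u}_i^T \right\| + \left\| \tilde{\mathbf{v}}_i (\mathbf{u}_i - \tilde{\mathbf{u}}_i)^T \right\| \\
 &= \sqrt{\mathrm{tr} \left[ (\mathbf{v}_i - \tilde{\mathbf{v}}_i) \mathbf{u}_i^T \mathbf{u}_i (\mathbf{v}_i - \tilde{\mathbf{v}}_i)^T \right]} + \sqrt{\mathrm{tr} \left[ (\mathbf{u}_i - \tilde{\mathbf{u}}_i) \tilde{\mathbf{v}}_i^T \tilde{\mathbf{v}}_i (\mathbf{u}_i - \tilde{\mathbf{u}}_i)^T \right]} \\
 &= \|\mathbf{v}_i - \tilde{\mathbf{v}}_i\| \cdot \|\mathbf{u}_i\| + \|\tilde{\mathbf{v}}_i\| \cdot \|\mathbf{u}_i - \tilde{\mathbf{u}}_i\| = \|\mathbf{v}_i - \tilde{\mathbf{v}}_i\| + \|\mathbf{u}_i - \tilde{\mathbf{u}}_i\|,
\end{aligned}$$

where we exploited that the spectral norm (i.e., the largest singular value) of an 
$n \times m$ matrix $\mathbf{A}$ is equal to either the square root of the largest eigenvalue of the 
matrix $\mathbf{A}\mathbf{A}^T$ or equivalently, that of $\mathbf{A}^T\mathbf{A}$. In the above calculations all of these 
matrices are of rank 1, hence, the largest eigenvalue of the symmetric, positive 
semidefinite matrix under the square root is the only non-zero eigenvalue of it, 
therefore, it is equal to its trace; finally, we used the commutativity of the 
trace, and in the last line we have the usual vector norm. 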

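Therefore the first term in (8) can be estimated from above by 

$$\begin{aligned}
\sum_{i=1}^{k} s_i \left\| \mathbf{v}_i \mathbf{u}_i^T - \tilde{\mathbf{v}}_i \tilde{\mathbf{u}}_i^T \right\|
 &\le \sqrt{k} \sqrt{\sum_{i=1}^{k} \|\mathbf{v}_i - \tilde{\mathbf{v}}_i\|^2} + \sqrt{k} \sqrt{\sum_{i=1}^{k} \|\mathbf{u}_i - \tilde{\mathbf{u}}_i\|^2} \\
 &\le \sqrt{k} \left( \sqrt{2 S_k^2(\mathbf{X})} + \sqrt{2 S_k^2(\mathbf{Y})} \right) = \sqrt{2k} \, (S_k(\mathbf{X}) + S_k(\mathbf{Y})).
\end{aligned}$$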



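Based on these considerations and the relation between the cut norm and the 
spectral norm (see Lemma 7), the densities to be estimated in the defining 
formula (5) of volume regularity can be written in terms of stepwise constant 
vectors in the following way. The vectors $\hat{\mathbf{v}}_i := \mathbf{D}_{row}^{-1/2} \tilde{\mathbf{v}}_i$ are stepwise constants on 
the partition $(R_1, \dots, R_k)$ of the rows, whereas the vectors $\hat{\mathbf{u}}_i := \mathbf{D}_{col}^{-1/2} \tilde{\mathbf{u}}_i$ are 
stepwise constants on the partition $(C_1, \dots, C_k)$ of the columns, $i = 1, \dots, k$. 
The matrix 

$$\sum_{i=1}^{k} s_i \hat{\mathbf{v}}_i \hat{\mathbf{u}}_i^T$$

is therefore an $n \times m$ block-matrix on $k \times k$ blocks belonging to the above 
partition of the rows and columns. Let $\hat{c}_{ab}$ denote its entries in the $a, b$ block 
($a, b = 1, \dots, k$). Using (8), the rank $k$ approximation of the matrix $\mathbf{C}$ is 
performed with the following accuracy of the perturbation $\mathbf{E}$ in spectral norm: 

$$\|\mathbf{E}\| = \left\| \mathbf{C} - \mathbf{D}_{row} \left( \sum_{i=1}^{k} s_i \hat{\mathbf{v}}_i \hat{\mathbf{u}}_i^T \right) \mathbf{D}_{col} \right\| = \left\| \mathbf{D}_{row}^{1/2} \left( \mathbf{C}_{corr} - \sum_{i=1}^{k} s_i \tilde{\mathbf{v}}_i \tilde{\mathbf{u}}_i^T \right) \mathbf{D}_{col}^{1/2} \right\|.$$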



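Therefore, the entries of $\mathbf{C}$, for $i \in R_a$, $j \in C_b$, can be decomposed as 

$$c_{ij} = d_{row,i} d_{col,j} \hat{c}_{ab} + \eta_{ij},$$

where the cut norm of the $n \times m$ error matrix $\mathbf{E} = (\eta_{ij})$ restricted to $R_a \times C_b$ 
(otherwise it contains entries all zeroes) and denoted by $\mathbf{E}_{ab}$, is estimated as 
follows: 

$$\begin{aligned}
\|\mathbf{E}_{ab}\|_{\square} &\le \sqrt{mn} \, \|\mathbf{E}_{ab}\| \le \sqrt{mn} \cdot \|\mathbf{D}_{row,a}^{1/2}\| \cdot \left( \sqrt{2k} (S_k(\mathbf{X}) + S_k(\mathbf{Y})) + \varepsilon \right) \cdot \|\mathbf{D}_{col,b}^{1/2}\| \\
 &\le \sqrt{mn} \sqrt{c_1 \frac{\mathrm{Vol}(R_a)}{|R_a|}} \sqrt{c_2 \frac{\mathrm{Vol}(C_b)}{|C_b|}} \left( \sqrt{2k} (S_k(\mathbf{X}) + S_k(\mathbf{Y})) + \varepsilon \right) \\
 &= \sqrt{c_1 c_2} \sqrt{\frac{n}{|R_a|}} \sqrt{\frac{m}{|C_b|}} \sqrt{\mathrm{Vol}(R_a)} \sqrt{\mathrm{Vol}(C_b)} \left( \sqrt{2k} (S_k(\mathbf{X}) + S_k(\mathbf{Y})) + \varepsilon \right) \\
 &\le \sqrt{\frac{c_1 c_2}{K_1 K_2}} \sqrt{\mathrm{Vol}(R_a)} \sqrt{\mathrm{Vol}(C_b)} \left( \sqrt{2k} (S_k(\mathbf{X}) + S_k(\mathbf{Y})) + \varepsilon \right) \\
 &= c \sqrt{\mathrm{Vol}(R_a)} \sqrt{\mathrm{Vol}(C_b)} \left( \sqrt{2k} (S_k(\mathbf{X}) + S_k(\mathbf{Y})) + \varepsilon \right),
\end{aligned}$$

where the $n \times n$ diagonal matrix $\mathbf{D}_{row,a}$ inherits $\mathbf{D}_{row}$'s diagonal entries over 
$R_a$, whereas the $m \times m$ diagonal matrix $\mathbf{D}_{col,b}$ inherits $\mathbf{D}_{col}$'s diagonal entries 
over $C_b$, otherwise they are zeros. Further, the constants $c_1, c_2$ are due to the 
fact that there are no dominant rows and columns, while $K_1, K_2$ are derived 
from the cluster size balancing conditions. Hence, the constant $c$ does not 
depend on $n$ and $m$. Consequently, for $a, b = 1, \dots, k$ and $X \subset R_a$, $Y \subset C_b$: 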

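$$\begin{aligned}
& |c(X, Y) - \rho(R_a, C_b) \mathrm{Vol}(X) \mathrm{Vol}(Y)| \\
&= \left| \sum_{i \in X} \sum_{j \in Y} (d_{row,i} d_{col,j} \hat{c}_{ab} + \eta_{ij}) - \frac{\mathrm{Vol}(X) \mathrm{Vol}(Y)}{\mathrm{Vol}(R_a) \mathrm{Vol}(C_b)} \sum_{i \in R_a} \sum_{j \in C_b} (d_{row,i} d_{col,j} \hat{c}_{ab} + \eta_{ij}) \right| \\
&= \left| \sum_{i \in X} \sum_{j \in Y} \eta_{ij} - \frac{\mathrm{Vol}(X) \mathrm{Vol}(Y)}{\mathrm{Vol}(R_a) \mathrm{Vol}(C_b)} \sum_{i \in R_a} \sum_{j \in C_b} \eta_{ij} \right| \le 2 \|\mathbf{E}_{ab}\|_{\square} \\
&\le 2c \left( \sqrt{2k} (S_k(\mathbf{X}) + S_k(\mathbf{Y})) + \varepsilon \right) \sqrt{\mathrm{Vol}(R_a) \mathrm{Vol}(C_b)},
\end{aligned}$$

that gives the required statement for $a, b = 1, \dots, k$. 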



Note that when we use Definition 5 of $\gamma$-volume regularity for the row-column 
cluster pairs $R_i, C_j$ ($i, j = 1, \dots, k$), then we may say that the $k$-way 
discrepancy of the underlying contingency table is the minimum $\gamma$ for which all 
the row-column cluster pairs are $\gamma$-volume regular. With this nomenclature, 
Theorem 8 states that the $k$-way discrepancy of a contingency table can be 
estimated from above by the $(k + 1)$th largest singular value of the 
correspondence matrix and the $k$-variance of the clusters obtained by the left and 
right singular vectors corresponding to the $k$ largest singular values of this 
matrix. Hence, SVD based representation is applicable to find volume regular 
cluster pairs for given $k$, where $k$ is the number of structural (protruding) 
singular values. 



5 Discussion, application, and extension to directed graphs 



In the ideal $k$-cluster case, we consider the following generalized random 
binary contingency table model: given the partition $(R_1, \dots, R_k)$ of the rows 
and $(C_1, \dots, C_k)$ of the columns, the entry in the row $i \in R_a$ and column 
$j \in C_b$ is 1 with probability $p_{ab}$, and 0 otherwise, independently of other 
rows of $R_a$ and columns of $C_b$, $1 \le a, b \le k$. We can think of the probability 
$p_{ab}$ as the inter-cluster density of the row-column cluster pair $R_a, C_b$. Since 
generalized contingency tables can be viewed as block-matrices (with $k \times k$ 
blocks) burdened with a general random noise, in [4], we gave the following 
spectral characterization of them. Fixing $k$, and tending with $n$ and $m$ to 
infinity in such a way that the cluster sizes grow at the same rate and also $n$ 
and $m$ subpolynomially, there exists a positive number $\theta \le 1$, independent of 
$n$ and $m$, such that for every $0 < \tau < 1/2$ there are exactly $k$ singular values 
of $\mathbf{C}_{corr}$ greater than $\theta - \max\{n^{-\tau}, m^{-\tau}\}$, while all the others are at most 
$\max\{n^{-\tau}, m^{-\tau}\}$; further, the weighted $k$-variance of the row and column 
representatives constructed by the $k$ transformed structural left and right singular 
vectors is $O(\max\{n^{-2\tau}, m^{-2\tau}\})$, respectively. 

For general contingency tables, our result is that the existence of $k$ singular 
values of $\mathbf{C}_{corr}$, separated from 0 by $\varepsilon$, is an indication of a $k$-cluster structure, 
while the singular values accumulating around 0 are responsible for the pairwise 
regularities. The clusters themselves can be recovered by applying the $k$-means 
algorithm for the row and column representatives obtained by the left and right 
singular vectors corresponding to the structural singular values. 
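The recovery step can be sketched for the simplest case $k = 2$. The plain-Python code below is an illustration under several assumptions that are not part of the paper: the table is a hypothetical noise-free two-block example, the second singular vector pair is approximated by a deflated power iteration instead of a library SVD, and a split by the sign of the single non-trivial coordinate stands in for the weighted k-means step.

```python
import math

def bicluster_two(C, iters=500):
    """Split rows and columns into two clusters via the second singular vector pair."""
    total = sum(sum(row) for row in C)
    P = [[c / total for c in row] for row in C]
    n, m = len(P), len(P[0])
    d_row = [sum(row) for row in P]
    d_col = [sum(col) for col in zip(*P)]
    M = [[P[i][j] / math.sqrt(d_row[i] * d_col[j]) for j in range(m)]
         for i in range(n)]                               # correspondence matrix
    u1 = [math.sqrt(d) for d in d_col]                    # trivial right vector (unit norm)
    u = [float(j + 1) for j in range(m)]                  # arbitrary start vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(u, u1))
        u = [a - dot * b for a, b in zip(u, u1)]          # remove the trivial direction
        v = [sum(M[i][j] * u[j] for j in range(m)) for i in range(n)]
        u = [sum(M[i][j] * v[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(a * a for a in u))
        u = [a / norm for a in u]                         # assumes rank at least 2
    v = [sum(M[i][j] * u[j] for j in range(m)) for i in range(n)]   # s_2 times v_2
    row_rep = [v[i] / math.sqrt(d_row[i]) for i in range(n)]
    col_rep = [u[j] / math.sqrt(d_col[j]) for j in range(m)]
    row_labels = [0 if x < 0 else 1 for x in row_rep]     # sign split replaces k-means
    col_labels = [0 if y < 0 else 1 for y in col_rep]
    return row_labels, col_labels

# Hypothetical 5 x 4 table: rows 0-1 prefer columns 0-1, rows 2-4 prefer columns 2-3.
C = [[4, 4, 1, 1],
     [4, 4, 1, 1],
     [1, 1, 4, 4],
     [1, 1, 4, 4],
     [1, 1, 4, 4]]
rows, cols = bicluster_two(C)
print(rows, cols)
```

Which cluster receives label 0 depends on the arbitrary orientation of the singular vectors, but rows and columns that are strongly associated end up with the same label. For more than two clusters the sign split would have to be replaced by a proper weighted k-means on the $k$-dimensional representatives.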

We applied the biclustering algorithm to find simultaneously clusters of stores 
and products based on their consumption in TESCO stores. Figure 1 shows 
3 clusters of the stores in which the consumption of the products belonging 
to the same cluster was homogeneous with consumption-density $\frac{c(R_a, C_b)}{\mathrm{Vol}(R_a) \mathrm{Vol}(C_b)}$ 
between store-cluster $R_a$ and product-cluster $C_b$ ($a, b = 1, \dots, 3$). After 
sorting the rows and columns according to their cluster memberships, we 
plotted the entries $\frac{c_{ij}}{d_{row,i} d_{col,j}}$ (there was one exceptional store-cluster which 
contained only 3 stores, but the others could be identified with groups of smaller 
and larger stores associated with product groups of high consumption-density 
within them). 




Fig. 1. Result of biclustering 119 stores and 719 products into 3 clusters 



We can consider square, but not symmetric contingency tables with zero 
diagonal as edge-weight matrices of directed graphs. The $n \times n$ edge-weight 
matrix $\mathbf{W}$ of a directed graph has zero diagonal, but is usually not symmetric: 
$w_{ij}$ is the weight of the $i \to j$ edge ($i, j = 1, \dots, n$; $i \ne j$). In this setup, the 
generalized in- and out-degrees are 

$$d_{out,i} = \sum_{j=1}^{n} w_{ij} \quad (i = 1, \dots, n) \qquad \text{and} \qquad d_{in,j} = \sum_{i=1}^{n} w_{ij} \quad (j = 1, \dots, n);$$

further, $\mathbf{D}_{in} = \mathrm{diag}(d_{in,1}, \dots, d_{in,n})$ and $\mathbf{D}_{out} = \mathrm{diag}(d_{out,1}, \dots, d_{out,n})$ are 
the in- and out-degree matrices. Suppose that there are no sources and sinks 
(i.e. no zero out- and in-degrees), further, that $\mathbf{W}$ is non-decomposable. Then 
the correspondence matrix belonging to $\mathbf{W}$ is 

$$\mathbf{W}_{corr} = \mathbf{D}_{out}^{-1/2} \mathbf{W} \mathbf{D}_{in}^{-1/2},$$

and its SVD is used to minimize the normalized two-way cut of $\mathbf{W}$ as a 
contingency table, see Section 3. Butler [7] generalized the Expander Mixing 
Lemma for this situation. We can further generalize it to obtain regular in- 
and out-vertex cluster pairs, for a given $k$, in the following sense. The $V_{in}, V_{out}$ 
in- and out-vertex cluster pair of the directed graph (with sum of the weights 
of directed edges 1) is $\gamma$-volume regular if for all $X \subset V_{out}$ and $Y \subset V_{in}$ the 
relation 

$$|w(X, Y) - \rho(V_{out}, V_{in}) \mathrm{Vol}_{out}(X) \mathrm{Vol}_{in}(Y)| \le \gamma \sqrt{\mathrm{Vol}_{out}(V_{out}) \mathrm{Vol}_{in}(V_{in})}$$

holds, where the directed cut $w(X, Y)$ is the sum of the weights of the $X \to Y$ 
edges, $\mathrm{Vol}_{out}(X) = \sum_{i \in X} d_{out,i}$, $\mathrm{Vol}_{in}(Y) = \sum_{j \in Y} d_{in,j}$, and $\rho(V_{out}, V_{in}) = 
\frac{w(V_{out}, V_{in})}{\mathrm{Vol}_{out}(V_{out}) \mathrm{Vol}_{in}(V_{in})}$ is the relative inter-cluster density of the out-in cluster pair 
$V_{out}, V_{in}$. The clustering $(V_{in,1}, \dots, V_{in,k})$ and $(V_{out,1}, \dots, V_{out,k})$ of the columns 
and rows, guaranteed by Theorem 8, corresponds to in- and out-clusters of 
the same vertex set such that the directed information flow $V_{out,a} \to V_{in,b}$ is 
as homogeneous as possible for all $a, b = 1, \dots, k$ pairs. 